Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving reading and recreating fit pseudodata #1328

Merged
merged 20 commits into from
Sep 15, 2021
Merged

Conversation

siranipour
Copy link
Contributor

@siranipour siranipour commented Jul 14, 2021

Now that python is used to construct the pseudodata, I've updated the pseudodata.py function that regenerates the pseudodata of a given fit. It's now much easier to do this in parallel because we don't use swig objects anymore.

Closes #1323

The tests will, however, fail for now until I regenerate a PDF that saves its python pseudodata for the regression test. @wilsonmr, I may be misremembering but did you at some point have a PR which saved the pseudodata of a fit?

EDIT: This has turned into a bit of a general improvements PR.

  1. Corrected the recreation of a fit pseudodata. I've swapped the C++ make_replica with the python one.
  2. Improved the reading of pseudodata from a fit which has fitting::savepseudodata: True
  3. There was a function that created the tr_mask for all replicas:
    def training_mask_table(replicas_exps_tr_masks, replicas, experiments_index):

    I've now changed this so it works on an individual replica and then collected it over replicas (I did this because it was useful for the recreation of a training validation mask.
  4. Fixed the tests

@siranipour siranipour marked this pull request as draft July 14, 2021 12:30
@wilsonmr
Copy link
Contributor

now that the replica generation is fast, can we not just call the action on a set of replicas? Is there really that much speed up from MP now? In fact since the action is in validphys, can we not just call --parallel in the usual way to get the MP?

@wilsonmr
Copy link
Contributor

@wilsonmr, I may be misremembering but did you at some point have a PR which saved the pseudodata of a fit?

This got added in #1081 I think with the savepseudodata flag:

if file_content["fitting"].get("savepseudodata"):

@wilsonmr
Copy link
Contributor

in particular, calling the API inside of a validphys action seems really horrible, I don't think this should be an internal action, or even exist at all

@siranipour
Copy link
Contributor Author

Oh true actually, let me see if the --parallel flag works

@wilsonmr
Copy link
Contributor

just to say, the function which I mentioned, pseudodata_table, was actually not updated to use make_replica in #1268 which is why the function is so painfully slow, it also means that function is currently useless because it's not even dumping the right pseudodata during the fit haha

@wilsonmr
Copy link
Contributor

how long does this take now with the collect approach @siranipour ?

@siranipour
Copy link
Contributor Author

just to say, the function which I mentioned, pseudodata_table, was actually not updated to use make_replica in #1268 which is why the function is so painfully slow, it also means that function is currently useless because it's not even dumping the right pseudodata during the fit haha

Ah ok haha, I'll try and get another PR for that then because the tests will fail until we regenerate a new regression fit.

how long does this take now with the collect approach @siranipour ?

Unfortunately I was a bit pushed for time, so I didn't get to fully benchmark it, but it still took a non negligible O(few minutes). I'll take a proper look and also play with the parallel flag and see what happens.

@siranipour siranipour changed the title Recreating python psueododata Recreating a fit's python psueododata Jul 21, 2021
@siranipour siranipour changed the title Recreating a fit's python psueododata Improving reading and recreating fit pseudodata Jul 21, 2021
@siranipour siranipour marked this pull request as ready for review July 21, 2021 16:12
@siranipour siranipour added the run-fit-bot Starts fit bot from a PR. label Jul 21, 2021
@github-actions
Copy link

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@siranipour
Copy link
Contributor Author

Any ideas why the regression tests fail? They pass locally for me...

@siranipour siranipour force-pushed the replicate_pseudodata branch 4 times, most recently from 0df61c3 to 2c49d34 Compare July 27, 2021 09:14
@siranipour
Copy link
Contributor Author

Ahh I generated the new baseline with theory 200 which is why the tests are failing. Will fix

@scarlehoff scarlehoff removed the run-fit-bot Starts fit bot from a PR. label Sep 1, 2021
@siranipour
Copy link
Contributor Author

@scarlehoff this PR flies in tandem with #1333, in that it reproduces the pseudodata a posteriori which is in principle the ones that are saved by #1333. I would appreciate if you could take a look at this soon too.

The basic idea is that I've added actions that work on a per replica basis. I.e replicate the pseudodata of replica_i. We then collect over either NSList(range(nrep), nskey='replica') to replicate all replicas, or produce_pdfreplicas, which is basically the same thing but accounting for the postfit reshuffling

validphys2/src/validphys/n3fit_data.py Show resolved Hide resolved
validphys2/src/validphys/config.py Show resolved Hide resolved
validphys2/src/validphys/pseudodata.py Outdated Show resolved Hide resolved
validphys2/src/validphys/pseudodata.py Outdated Show resolved Hide resolved
siranipour and others added 2 commits September 14, 2021 17:19
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
@siranipour siranipour merged commit b7e1e14 into master Sep 15, 2021
@siranipour siranipour deleted the replicate_pseudodata branch September 15, 2021 15:18
@Zaharid Zaharid added the enhancement New feature or request label Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

should pseudata.py should use fitting_data_dict?
4 participants